Feature/tighten hash usage (fixing Issues #316 and #423) #441

JoshuaGreen · 2024-01-28T21:38:40Z

I think it's finally time to throw this monstrosity to the wolves so that it can receive some impartial review.

My original modest goal was to resolve my personal Issue #316 that likely bothered no one else. The standard dhtElement type stored Key and Data as dhtConstValues which were typedefed as void const *s, but we apparently wanted to store data_types (i.e., unsigned ints) natively. The union aimed to treat the memory holding a void const * as instead holding a data_type, but such punning left me squeamish and my attempt to simply cast back and forth between pointers and integers didn't obviously work. I therefore went with a different idea, namely to make Key and Data themselves unions. These unions can (I believe) store any fundamental type natively, though I leave floating point as a compile-time option that we don't make use of.
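A minimal sketch of the idea, assuming C99 and using hypothetical names (`example_dht_value`, `example_dht_element`, `EXAMPLE_DHT_FLOATING_POINT`) rather than the PR's actual declarations. Key and Data become unions, so an unsigned integer is written and read through an integer member and no pointer/integer conversion or punning is involved. Floating point sits behind a compile-time switch, mirroring the option described above.

```c
#include <assert.h>

/* Hypothetical sketch (illustrative names, not the PR's declarations):
 * each member holds one fundamental type natively, so an unsigned integer
 * is stored as an integer rather than punned into a void const *. */
typedef union
{
  void const *object_pointer;
  unsigned long unsigned_integer;
  long signed_integer;
#if defined(EXAMPLE_DHT_FLOATING_POINT)
  double floating_point;   /* only compiled in when explicitly requested */
#endif
} example_dht_value;

typedef struct
{
  example_dht_value Key;
  example_dht_value Data;
} example_dht_element;

/* Values are written through the integer member... */
static example_dht_element example_make_element(unsigned long key,
                                                unsigned long data)
{
  example_dht_element element;
  element.Key.unsigned_integer = key;
  element.Data.unsigned_integer = data;
  return element;
}

/* ...and read back through the same member, so no cast is needed. */
static unsigned long example_data_as_unsigned(example_dht_element const *element)
{
  return element->Data.unsigned_integer;
}
```

The design point is that reading the member that was last written is always well defined, whereas converting an integer to a `void const *` and back is implementation-defined.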

Making the above change required me to update the various hash data types accordingly. I tightened a number of their operations, allowing them to handle more cases correctly and catch more bugs via asserts. I also made the code more portable.

I was pretty satisfied with the above, but then Pull Request #360 came in which described Issue #423. The suggested fix there was to increase hashed buffer sizes, and without doing a deep analysis of the Issue I could see that that could be necessary. However, I was skeptical of the arbitrary nature of the new size, and I wanted to determine the actual size needed. As this seemed related to hash operations, I worked on that update here. My results can be seen in optimisations/hash.h, and I invite scrutiny of my calculations there. (In particular, I'm still just guessing on MAX_EN_PASSANT_TOP_DIFFERENCE, and the right value there could be smaller [or conceivably larger].)
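To illustrate deriving a buffer size instead of picking an arbitrary one, here is a sketch assuming a C11 compiler. Every constant below is hypothetical and stands in for the real terms worked out in optimisations/hash.h; only the names `hashbuf_length` and `Leng` come from this PR. The length is computed from the worst case, and a compile-time check confirms it still fits in the length field.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical worst-case inputs; the real terms and values are the ones
 * derived in optimisations/hash.h. An enumerator may refer to the ones
 * declared before it, so the length is derived in the same enum. */
enum
{
  example_max_pieces = 64,       /* every square occupied */
  example_bytes_per_piece = 2,   /* hypothetical per-piece encoding size */
  example_fixed_overhead = 8,    /* hypothetical fixed header size */
  example_hashbuf_length = example_fixed_overhead
                           + example_max_pieces * example_bytes_per_piece
};

typedef struct
{
  unsigned short Leng;                          /* bytes actually used */
  unsigned char buffer[example_hashbuf_length];
} example_hashbuf;

/* Compile-time guard: the computed length must be representable in Leng. */
_Static_assert(example_hashbuf_length <= USHRT_MAX,
               "hashbuf length does not fit in the Leng member");
```

Changing any input then resizes the buffer automatically, and an input that would overflow `Leng` fails the build rather than corrupting memory at run time.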

I thought that the above would fix Issue #423, but testing showed that I also needed to increase MaxPieceId. That update has nothing to do with hashing, but I went ahead and made it. I think I've fixed whatever relied on MaxPieceId being only 63, but please double-check this.
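One way the jump from 63 to 64 can matter, assuming (hypothetically) that some code packs a piece ID into a bit field or mask sized for the largest ID: 63 fits in 6 bits, but 64 needs 7. The helper below is a hypothetical illustration, not code from the PR.

```c
#include <assert.h>

/* Hypothetical helper: the number of bits needed to represent every value
 * in the range 0..max_value (at least one bit, even for 0). */
static unsigned example_bits_needed(unsigned long max_value)
{
  unsigned bits = 0;
  do
  {
    ++bits;
    max_value >>= 1;   /* drop one bit per iteration */
  } while (max_value != 0);
  return bits;
}
```

Any storage sized as `example_bits_needed(63)` bits would silently truncate an ID of 64, which is the kind of hidden dependency worth double-checking.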

It seemed that this should all have been sufficient, but further testing yielded surprising SEGFAULTs. It took a lot of investigation to sort this out, and I found that the issue came down to alignment. Specifically, fxfAlloc didn't really enforce data type alignment, despite comments suggesting otherwise. This wasn't a problem when our allocated objects only held unsigned chars and similar, but it became one once we upgraded one element to an unsigned short (to solve Issue #423). I therefore dug into the fxf infrastructure, figuring out how it operates and modifying it to respect alignment requirements. I think I got this right, but this could also use a lot of testing on different systems.
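The underlying arithmetic can be sketched as follows, assuming a C11 compiler and an allocator that carves fixed-size chunks out of larger segments. This is not fxf's actual code, and both function names are hypothetical. An object of `size` bytes can never need stricter alignment than the largest power of two dividing `size`, capped at the platform maximum. Each chunk's address must be rounded up to that before it is handed out; for example, an unsigned short typically needs an even address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: strictest alignment an object of `size` bytes can need,
 * i.e. the lowest set bit of size, capped at alignof(max_align_t). */
static size_t example_alignment_for_size(size_t size)
{
  size_t const max_alignment = _Alignof(max_align_t);
  size_t const lowest_bit = size & (~size + 1); /* isolate lowest set bit */
  assert(size > 0);
  return lowest_bit < max_alignment ? lowest_bit : max_alignment;
}

/* Hypothetical: round an address up to a multiple of `alignment`,
 * which must be a power of two. */
static uintptr_t example_align_up(uintptr_t address, size_t alignment)
{
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (address + (alignment - 1)) & ~(uintptr_t)(alignment - 1);
}
```

Deriving the alignment from the size avoids padding small chunks of unsigned chars out to the maximum alignment, while still giving larger chunks the alignment their contents may require.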

While working on the above, I in some sense came full circle and began to worry about strict aliasing violations. I asked about one usage in this StackOverflow question, and the responses—especially this linked question—suggested that I was right to be worried. I therefore modified my implementation, doing my best to avoid assigning an obvious type to the pointers. This seems to work in practice, though we may want to consider compiler options like -fno-strict-aliasing and/or -fno-ipa-strict-aliasing (where available) to be safe. (Only testing can determine where or if they're necessary and what performance impacts they may have. Unfortunately, I haven't found any documentation that would allow me to say which one(s) would be most appropriate.)
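For comparison, a commonly used alternative (not the approach taken in this PR) is to never dereference a typed pointer into the raw buffer, and to copy through `memcpy` instead. Character types may alias any object, and `memcpy` copies the object representation, so no strict-aliasing rule is broken. Unaligned offsets are also handled. Compilers usually lower a small fixed-size `memcpy` to a single load or store. The function names below are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: store an unsigned short inside a raw byte buffer
 * without ever forming an (unsigned short *) into that buffer. */
static void example_store_ushort(unsigned char *buffer, size_t offset,
                                 unsigned short value)
{
  memcpy(buffer + offset, &value, sizeof value);
}

/* Hypothetical counterpart: load it back the same way. */
static unsigned short example_load_ushort(unsigned char const *buffer,
                                          size_t offset)
{
  unsigned short value;
  memcpy(&value, buffer + offset, sizeof value);
  return value;
}
```

Whether this performs as well as `-fno-strict-aliasing` in practice would, like the compiler options themselves, have to be measured.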

With all of the above changes, testing on the non-lengthy examples seems to work fine. I do see small differences in a few files, but they all have relatively large numbers of hash table accesses, so my assumption is that these are cases where we're running out of hash space. Since I changed the allocation logic it's not so surprising that we'd see some differences when that happens, but notably the new numbers are sometimes better and sometimes worse than the previous ones.


JoshuaGreen · 2024-05-07T01:34:07Z

NOTE: The calculations in optimisations/hash.h don't obviously match the computations in optimisations/hash.c from Commit 5ffb4723d8c608553ecaf6d15b114883f7b7d544. If we go forward with this pull request then the calculations will probably have to be corrected.

JoshuaGreen · 2024-05-14T23:12:23Z

I'm currently updating the calculations in optimisations/hash.h to match what's now happening in optimisations/hash.c.

MuTsunTsai · 2024-05-15T05:19:15Z

Really looking forward to this PR being merged. This solves many known issues, including the one described in #360.

JoshuaGreen · 2024-05-19T23:42:47Z

I'm closing this PR as the updates to develop have left this branch behind. Please consider PR #493 instead.

JoshuaGreen added 30 commits September 2, 2023 13:33

Attempting to improve the DHT abstraction.

9b5d1b7

Adding some important documentation.

25a9e54

Simplifying things -- dhtKey IS a dhtValue, specifically one that also defines a hash value and equality.

0f1f1af

Might as well eliminate a compiler warning while we're at it.

ef64367

Making cppcheck happier.

2b4c821

I guess this is more generic.

258f055

Since we want everything to be 0, we don't need the explicit initialiser.

1958bb2

Prefer unsigned integers.

3fe00c7

Fixing some cppcheck style warnings.

f6d1f70

Fixing some typos.

a5f410f

Fixing some more typos.

de4eeaf

Beginning to track dimensions so we can ensure that we don't write outside of bounds.

3fb8b38

Matching style.

9518402

Lots of simplifications, plus other changes.

427e8f7

Restoring the union as there was no strong reason to remove it. We can achieve the same by adjusting hashbuf_length accordingly.

d5dcfb6

Comparing the right upper bound, the one that indicates how much space we have.

488b13b

The Leng member is an unsigned (short) int, so cast to ensure it matches the format specifier.

1967966

Let's just use the right format specifier.

f95c1cd

Fixing some issues found by VSCode.

7786ef5

Oops, forgot a term.

591df89

Those functions can be static.

87c28c9

Trying to compute the needed size at compile-time.

1159ebe

Fixing some typos, allowing NULL pointers to be duplicated.

df21739

Holding back on expanding the buffers until I figure out why this is SEGFAULTing on the first EXAMPLES problem.

c4ef24d

Removing long double floating_point; since fxfAlloc (apparently) can't handle such allocations.

69c6c37

Trying to make uses of tabs consistent.

f8ab1c3

Bumping the required alignment; this allows us to store long doubles (or anything else that may come along, presumably).

14daa9e

If the board is filled then we need 64 piece IDs.

300ef7f

Going back to assertions.

19c2284

Some reformatting.

cedb9af

JoshuaGreen added 22 commits November 25, 2023 20:19

Probably improving this bound.

9f202a4

Asserting on one of our assumptions.

7be78f3

Correcting the constant's name.

facb5e7

Ensuring we have the constants we need.

bece590

Shifting the sanity check to the .c file.

23005b1

Changing this isn't part of the branch.

ec2e09c

Adding a TODO.

5c885c3

Adding some useful comments, making fxfMAXSIZE an enum (to match fxfMINSIZE), allowing access to that value from outside FXF, and carefully storing whatever we can before moving on to the next segment.

454665c

Always including the corresponding header, eliminating a warning about an empty translation unit that we get when not using FXF.

4cbce9e

We don't need a loop since we already know that we're < fxfMAXSIZE (as otherwise size would have necessarily fit).

8b56652

Replacing two checks with a single check against a compile-time constant.

fe4a949

Using a different hack to eliminate the warning.

39e2827

Replacing a complicated expression with a macro.

bcab92b

Ensuring that these values stay aligned.

02c90db

Merging the enums to eliminate a compiler warning.

7f6c53a

Things simplify if we assume that fxfMINSIZE > 0, so let's do that.

8f7a900

Adding the buffer option.

3eaa9d1

Choosing a more descriptive name.

ab293f8

We operate on dhtKeys through their values, so make that clear.

8d84958

Restoring an assert.

bcc1958

Keeping the original format of the assert.

be528c5

Marking the new assert as a TODO.

61f3233

JoshuaGreen mentioned this pull request May 13, 2024

Increase hashbuf_length #360

Closed

This was referenced May 19, 2024

Positions with 64 pieces cause overflows #492

Closed

Feature/tighten hash usage 2 (fixing Issues #316, #423, and probably #492) #493

Open

JoshuaGreen closed this May 19, 2024
